Reading Assigned Folios and Understanding User Stories

I began by compiling all the folios assigned to me into a PDF, printing them out, and reading through them to get a sense of the different activities they describe. Having gone through the Making and Knowing project page, its Twitter feed, and its Flickr site, I had some idea of what to expect. Even so, as I read my folios I was quite amazed, and even startled, by some of the objects being described. In particular, I was intrigued by the casting of crayfishes, grasshoppers, and serpents. The casting of entwined animals, especially snakes, was a theme that Professor Smith touched upon in her introductory session on BnF Ms Fr 640 and its historical context.
To pin down the activities my folios deal with, I extracted just the headings of all the entries. They are as follows:
Decorating mirrors [...] and smaller things (p131r); Training a dog (---); Ink and molded paper (---); Molded wax (p131r-p131v); When lead or cast tin expands (p131v-p132r); Mold made with two casts (p132r); Reheating molds (p132r-p132v); Common quarry sand (p132v); Filings (---); Gilding animals cast with silver (---); Hard wax to impress seals (p133r); Casting the feet of small lizards with gold and silver (---); Wiremarks which are on the head of the animal (---); Clamps and broken mold (---); Expansion and little holes in the mold (p133v); Thing that cannot be stripped from the mold (---); Animals entwined (p133v-p134r); Reworking a pierced mold (p134r); Sand made of filings (p134r); To make gold fluid (p134v); Casting big works with gold (p134v); Secret for soldering small works made of gold and silver (---); Colors or sauce for gold (p135r); Softening gold (---); Casting (p135r); Vine leaf and small [...] (p135r); Casting with [gold] (p135v-p136r); Enamelling small works (p136r); Lead casting (p136v); Casting red copper (p137r); Oil [...] to make metal fluid (p137r); Clamps and broken mold (---); Soaking sand to mold flat medals (p137v-p138r); Making imitation diamonds (p138v); Molding with talc mixed with sand (---); Casting gold very finely (---); Casting lead or tin (p139r-p140v); Casting wax to mold an animal that we don't have (p139v-p140v); Sulphur casting (p140v); Molding and shortening a big piece (---); Casting lead or tin into plaster (---); Molding crayfishes (p141r-p142v); Molding grasshoppers and other too delicate things (p142v); Molds (p142v); Luting molds (---); Molder from Foix (p143r); Molding turtles (---); Toad (p143r-p143v); [translation missing] (p143v); Iron filings (---); Carnations (---);
Reading through the folios and comparing them with the headings, several themes come to the fore: metalworking, wax molding, casting with different materials, color making, and animal casts (crayfishes, grasshoppers, turtles, toads). I then turned to look for annotations. The following are the ones linked to my folios:
Related Annotations:
AnnotationSpring2016_ChangClemensShi_Esmail_39v103v104r116r136r [Folder: Colormaking Spring 2016]
Annotation_Fall2014_CarlsonKatz_MoldedLetterPaper_131r [Folder: Metalworking-and-Moldmaking Fall 2014]
AnnotationSpring2015_FuZhang_TooThinThings_142v [Folder: Metalworking-and-Moldmaking Spring 2015]
Since I am so used to reading books with well-defined chapters, it was a little disorienting to realize that there are no such divisions in the manuscript. Are the folios then compiled as one flowing text? Another question relates to how the manuscript might have been used: the folios assigned to me read like a reference book for artisans, metalworkers, dyers, and other craftsmen, one that would allow them to replicate certain procedures, almost like a manual. If these written instructions or experiences were meant to be used to make similar objects, or to practice the same processes, why are the folios compiled together as one large manuscript without chapter divisions? If the codex were divided into separate chapters such as metalworking, color-making, and so on, would it not have been easier to use within a workshop?
Digital Design: User Stories
Trying to develop a user story made me think about what functionality I will be working on for a digital edition of the codex. Given the sections assigned to me, a question that comes to mind is: how will the end user experience this part of the book? Perhaps my understanding of the folios above could correlate with how these processes, especially the written and illuminated form of the experiences embedded in them, can be re-experienced in a digital, interactive form. Here annotations and images from previous workshops could feed into that digital experience. But before these questions, how do I define the end user of the digital edition? Or rather, who is the end user? Is it the general public, or a more scholarly audience (keeping in mind the future of the book that we are thinking about here)? How does a more sophisticated user story emerge over time?
Several possible schemas for a user story seem powerful in different ways. It seems to me that one of the most powerful aspects of a digital edition over a printed book is that it changes the way you read: you do not have to move page by page, or even through an index. The "contents" of a digital edition instead require different framing devices that allow a user to interact with them.

Developing User Stories

In generating user stories I turned to several points made by Dr. Lauren Kassell in her presentation on The Casebooks Project. I was particularly drawn to her suggestion that users of extremely voluminous digital editions often need to be "tutored" in how to use the different features that expose the content and form of an archive like the Casebooks. I also went to the Casebooks Project webpage to get a sense of the kinds of exploratory data features in place beyond the standard search bar. Even a reader like me, who is not an expert on the history of medicine in Britain, was able to read and explore a wide variety of historical information that complemented the digital volumes themselves. Keeping the Casebooks Project in mind as a model, I came up with the following user stories:
1.) A "How-to" for the digitally critical version of the BnF Ms Fr 640: As a scholar of the history of science and art, how do I read or use the digital edition of the BnF Ms Fr 640?
2.) For the second user story I was thinking of how to add more historical context as a wrapper around the digital edition: As a non-specialist in the history of science and art, I want adequate historical context for the digital BnF Ms Fr 640.
3.) There must be a meaningful way to integrate the historical reconstruction projects with the digital edition; more pointedly, how do we integrate the annotations produced from the lab experiments with it? As an annotator invested in historical reconstruction, how does my work feature in the digital edition?

Introduction to Metadata and Reading Assigned Folios (Continued)

During lab on February 3rd we came up with the metadata structure below as a baseline model. As far as I understand, metadata is connected to our user stories, and it allows us to design features for the digital edition of BnF Ms Fr 640. The metadata structure is as follows:
Identifier, heading, image_url, folio_start, folio_start_r_or_v, folio_end, folio_end_r_or_v, activity, ingredients, number_of_ingredients, annotation1_title, annotation1_url, annotation2_title, annotation2_url, annotation3_title, annotation3_url, place_names, person_names, product, foreign_language.
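To make the structure concrete, a single record in my metadata sheet might look something like the sketch below for the entry "Molding crayfishes" (p141r-p142v); apart from the heading and folio range, the values (including the exact form of the identifier) are illustrative placeholders rather than my final metadata:

    identifier: p141r_1
    heading: Molding crayfishes
    image_url: [link to the folio image]
    folio_start: 141
    folio_start_r_or_v: r
    folio_end: 142
    folio_end_r_or_v: v
    activity: molding; casting
    ingredients: sand; wax
    number_of_ingredients: 2
    annotation1_title: (none yet)
    annotation1_url: (none yet)
    place_names: (none)
    person_names: (none)
    product: cast of a crayfish
    foreign_language: (none)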
Additional Metadata Fields? Preserving information from the manuscript
With such metadata, what features are we going to implement? What additional metadata fields do we require? Apart from ingredients, do we need a tools metadata field? For now I have included tools along with ingredients. Sometimes a recipe includes information beyond the process of making an object. For example, in the recipe “casting with tin and lead” we have the following line: “Square molds are made of earth or blades of copper, or iron, or wood covered with white iron, in order to bury more easily these aforementioned molds between the thin sheets of copper, estric or one of iron.” This information goes beyond the immediate scope of the recipe. Do we preserve it?
Sense of a Codex
As I fill in the metadata fields, a question: why do we need folio_start_r_or_v and folio_end_r_or_v? A natural starting point for an answer would be that recipes might move from recto to verso and across folios (and this is how I have used these fields). However, in a digital edition, why do we need to reproduce the form of a codex with folios at all? If we were truly digital, would we not move away from the sense of turning a page? When marginalia appear in a folio, I chose to include the ingredients they mention. Should a separate metadata field be added for marginalia? Again, this would preserve the sense of a folio or a codex.
The first person voice/subjectivity in the manuscript
As I read the recipes there is often an “I” telling the reader his or her experience of making the object in question. Who is this “I”, and is it possible to structure a narrative or digital functionality around him or her? After all, the first-person voice is a recurring motif. There is a certain subjectivity, if you will, at play here, and it may get lost in the recreation of the manuscript. How do we preserve or re-create it?

Re-configuring Metadata

This week we tried to reconfigure our metadata structure in a way that could represent the rich information all of us have in our respective folios. After some back and forth we retained certain fields and modified others. In particular, the following fields were added or renamed:
material
tool
purpose
subject_activity_keywords
“Material” was introduced in place of “ingredients”, while “tool” was newly added. “Purpose” was previously “product”, and “subject_activity_keywords” was previously just “activity”. Having worked with the folios this week, I find that these modified metadata fields capture noticeably more information from the folios I was working with.
Since this is an iterative process, I also realize these metadata fields are subject to change until they are unanimously fixed at some point in the near future. To keep thinking about how these fields could be improved: a difficulty I faced with the “material” field in particular arose when a folio entry had very few materials, or none at all. In effect, a large amount of the data in such an entry was lost, since the entry was often giving advice and enumerating maintenance procedures rather than describing a materially rich making process. When I reached an entry in my folios giving advice on how to remove a toad from a mold, there was more description of procedure, of steps in a process, than of materials; such an entry did not have many materials that could be extracted as metadata. In another, similar entry giving advice on cleaning iron filings, there were no materials involved at all. One way of formally including such advice or descriptive data could be to broaden the “material” field into “material_description”. This would give us a multi-value field that could hold “descriptions” for the more procedural activities (advice at large) as well as materials for the more standard recipe entries.
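As a quick sketch of how this might play out (the values below are illustrative rather than my actual metadata), compare a standard recipe entry with an advice entry under the proposed “material_description” field:

    heading: Molding crayfishes
    material_description: sand; wax
    tool: mold
    purpose: cast of a crayfish
    subject_activity_keywords: molding; casting

    heading: Toad
    material_description: advice on removing a toad from its mold (procedural description; no distinct materials)
    tool: mold
    purpose: advice on handling molds
    subject_activity_keywords: molding; advice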
I also opened my ViewShare account and began exploring the consolidated folio file (which is essentially all the metadata sheets that everyone in class has been working on, put together). As far as I understand, ViewShare lets you take a macro or “bird’s-eye” view of the data you are working with: if the data is standardized, or if each field has values that can be clustered together, ViewShare allows you to represent the data through different filters (lists, tables) and different graphical formats (pie charts, maps, histograms). Going back to the question of entries with no materials, I tried to see whether I could draw out all the folio entries in the consolidated file that have no materials using a table view. In other words, I created a table (apart from the list we created in class on ViewShare) to explore different ways of filtering the entries based on materials.

Calibrating Metadata Table and Working with ViewShare

Returning to ViewShare, I continued refining the table view I had created in order to filter entries based on materials, or the lack of them. A question that lingers: if ViewShare is a platform that allows us to visualize standardized data, what meaningful visualizations can we create from the metadata we have at hand?

ViewShare and Introduction to Text Processing

I went back to ViewShare and created a table widget for the consolidated metadata table. As I understand it, we created a metadata table from the folios assigned to us; this process breaks the folios down into their critical components, such as entries, ingredients, and tools, and so converts them into more analyzable data. ViewShare seems more conducive to working with numeric or discrete data than with text. This week we also began familiarizing ourselves with the command line as a text-processing tool, as compared to the previous week, when we explored its file-management capabilities. My sense is that the command line is more powerful than ViewShare as far as mining numeric and textual data is concerned. With ViewShare we have been trying to visualize our metadata in different ways, such as pie charts, histograms, and lists. The command line, on the other hand, allows us to play with and “see” our metadata in a more granular way. For example, we can list all entries that have no materials, and we can perform other textual transformations, such as eliminating stop words (conjunctions, pronouns, and other connecting parts of speech) from each folio. To my understanding, ViewShare does not provide such functionality, so the command line seems the more powerful method for textual analysis.
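As a rough sketch of the first of these operations (assuming the consolidated metadata has been exported as a plain CSV called folios.csv with the heading in the second column and material in the eighth; the file name and column positions are assumptions, and the naive comma split would break on quoted fields that contain commas):

    # print the heading of every entry whose material field is empty
    awk -F',' '$8 == "" { print $2 }' folios.csv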
Milligan and Baker's "Introduction to the Bash Command Line," especially working through its examples, was crucial in becoming familiar with the command line. During lab we also started learning the fundamentals of working with GitHub: how to clone a repository, how to check which files have not yet been "pushed" to the remote repository, and how to update a repository.
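In terms of actual commands, those fundamentals map roughly onto the following (the repository URL and the commit message are placeholders):

    # copy a remote repository onto the local machine
    git clone https://github.com/username/repository.git
    cd repository
    # check which files have changed locally and have not yet been committed or pushed
    git status
    # bring in changes from the remote repository
    git pull
    # stage, commit, and push local changes back to the remote
    git add .
    git commit -m "update folio entries"
    git push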

Text Processing Continued

I went through the examples in Milligan and Baker once again. After this I explored the I/O redirection operators ">", ">>", and "|". Using ">>", the append operator, I created a consolidated file containing all the folios assigned to me. The "|", or pipe, operator is particularly powerful since it allows us to combine different command-line functions. For example, we can count the number of files in a folder simply by combining the "ls" command with the "wc" command. If we wanted to find the number of files in a folder called "project", all we would have to do after "pathing" into the folder is run: ls -a | wc -l. This command "pipes" the list of files in "project/" into the line-count functionality of "wc", giving us the number of files in "project/". In class we began learning more advanced command-line techniques using translate ("tr") and "grep". An illustrative example was generating a frequency count of the words in a file through the command: tr -sc 'A-Za-z' '\n' < "text-file" | sort | uniq -c. The logic of this command is as follows: first "squeeze" every run of non-alphabetic characters in the file into a newline character, which puts each word on its own line; then sort this list of words; and finally count the repetitions of each word with "uniq -c". This gives us the frequency of each unique word that occurs in the file. We also created a bigram list from a sample folio, and explored removing stop words from the frequency list. Having learned these more advanced text-processing techniques, I found reading "Unix for Poets" much more productive, and I was able to work through some of the questions posed in that presentation. Viewing the video on regular expressions also helped me develop an intuition for their versatility.
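Putting the week's commands together in one place, the pipelines look roughly like this; the folio file names and the stopwords.txt list are placeholders for whatever our repository actually contains:

    # consolidate my folio transcriptions into one file with the append operator
    cat p141r.txt >> my_folios.txt
    cat p142v.txt >> my_folios.txt

    # count the files in a folder ("ls -a" also lists "." and "..", so the total is two too high)
    ls -a | wc -l

    # word frequencies: squeeze runs of non-letters into newlines, sort, count repetitions
    tr -sc 'A-Za-z' '\n' < my_folios.txt | sort | uniq -c | sort -rn

    # the same list with stop words removed (stopwords.txt holds one lowercase word per line)
    tr -sc 'A-Za-z' '\n' < my_folios.txt | tr 'A-Z' 'a-z' | grep -v -w -f stopwords.txt | sort | uniq -c | sort -rn

    # bigrams: pair each word with the word that follows it, then count the pairs
    tr -sc 'A-Za-z' '\n' < my_folios.txt > words.txt
    tail -n +2 words.txt > next_words.txt
    paste words.txt next_words.txt | sort | uniq -c | sort -rn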

Minimal Computing

Minimal computing reduces hardware and algorithmic dependencies. It also lowers the entry barrier, the high costs involved in using proprietary hardware and software. As a method, it attempts to respond to corporate dominance in online as well as print publishing. More pointedly, minimal computing methods reduce server time, maintenance costs, and the use of internet bandwidth. That said, sufficient time has to be invested in learning and becoming familiar with the respective digital environments. If I understood Alex correctly, we will be creating static sites for our minimal edition. Alex explained static sites by contrasting them with dynamic sites: dynamic sites generate their pages on the fly, and 90% of the internet is generated dynamically from databases and through security flows. Static sites, on the other hand, are built in advance rather than generated per request; once rendered by Jekyll, they depend only on the browser. "Ed" is a theme for Jekyll. YAML front matter in the .md files records metadata and structures the layout; Markdown itself is a reduced form of HTML. Jekyll interprets both the YAML and the .md files, and Liquid supplies the templating logic of "where these static sites should go". During lab this week we installed Atom and started converting the folios assigned to us into entries. The entries were saved as .md files, and we experimented with basic markup that GitHub understands, such as different heading sizes and image elements; a sketch of such an entry file follows the list below. While converting folios into entries I had to spend a lot of time dealing with extremely long entries. I encountered three entries that each spanned three folio pages. These were:
p139r_1 casting with lead and tin
p139v_1 casting an animal one has not got
p141r_1 casting a crayfish
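A minimal sketch of what one of these entry files might look like, assuming Ed's convention of YAML front matter followed by Markdown; the layout name, image path, and body text are illustrative placeholders rather than our actual entry:

    ---
    layout: narrative
    title: "Casting a crayfish"
    ---

    # Casting a crayfish

    Transcribed and translated text of the entry, beginning on folio 141r and
    continuing onto the following folios...

    ![Folio 141r](images/p141r.jpg)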

Transforming XML to Markdown, and Revisiting Text Markup and Layout

Apart from marking up our entries this week, we also thought through modifications that could be made to the XML elements as we went through them. While reading certain entries I came across references to other entries, cross-references you could say. As I was thinking of modifications to our tag set, a question came to mind: how do we mark cross-references between entries? Also, marginalia that were marked up as distinct entities often seemed to flow into each other quite naturally, even at the level of language and tense. This made me wonder if we could add an element called "related-block". At this point I have come across such related blocks both on the same folio page and across folios. I also introduced the "related-block" tag to wrap the main body of an entry, which is particularly useful when entries span multiple pages: "related-block" can give the main body of an entry a semantic whole as it spans multiple folios. Another question that came to mind while thinking of creating "consolidated entries" was the handling of marginalia. But first, by a "consolidated entry" I mean a long entry which spans multiple folio pages in the codex but which, in the digital edition, has only "one" page (i.e., the layout) where all the folio pages of that long entry have to be "consolidated" or "arranged". How do we consolidate the main body of a really long entry? And apart from consolidating the main body of an entry across folios, how do we handle marginalia? Where will marginalia be placed in the final layout when we use "Ed"?
While exploring how "related-block" can be used within a really long entry like p139v_1 (Casting with lead and tin), I realized that another possible way to consolidate the body of such an entry would be to remove page breaks, so that the body reads as one coherent semantic unit. I believe we can retain page breaks and their informational content through the "folio" tag, while perhaps not displaying them in the final output. For such a layout to remain coherent we would also need an interlinear feature that splits the layout into body and marginalia. One possible problem with an interlinear feature is whether marginalia retain their position beside the body of the entry, as they do in each folio; it is important to make sure that they do. One way to achieve this is to set the marginalia in a smaller font than the main body.
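To make this concrete, here is a hypothetical fragment of what such markup might look like; only "related-block" and the "folio" tag come from our discussion, while the other element names and the placeholder text are stand-ins rather than our actual tag set:

    <entry id="p139r_1">
      <heading>Casting with lead and tin</heading>
      <related-block>
        <folio n="139r">
          <body>First part of the entry's main body...</body>
          <marginalia>A margin note that continues into the next folio...</marginalia>
        </folio>
        <folio n="139v">
          <body>Continuation of the main body on the following folio...</body>
        </folio>
      </related-block>
    </entry>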
The above were some issues I encountered while trying to mark up some of the longer entries. This week we also learned the fundamentals of transforming XML into Markdown. Critical to this endeavour is the document type definition (.dtd) file, which formulates the structure of each entry. Any change to the structure of an entry has to be made first in the .dtd file, as this allows us to check that all entries remain valid against the shared structure. Each XML file is then transformed into Markdown by executing an XSLT stylesheet.
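In practice, the validation and the transformation can be run from the command line along the following lines; the file names are placeholders, and our class setup may use different tools than the two assumed here:

    # check an entry against the shared DTD (uses libxml2's xmllint)
    xmllint --noout --dtdvalid entries.dtd p141r_1.xml

    # transform the validated XML into Markdown by applying an XSLT stylesheet (uses xsltproc)
    xsltproc xml-to-md.xsl p141r_1.xml > p141r_1.md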